{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 两独立样本比较的秩和检验\n",
    "\n",
    "## 案例\n",
    "\n",
    "观察有无淋巴结转移胃癌患者的生存时间，探究生存时间与有无淋巴结转移是否相关。\n",
    "\n",
    "### 数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (24, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>IID</th><th>time</th><th>metastasis</th></tr><tr><td>i64</td><td>i64</td><td>bool</td></tr></thead><tbody><tr><td>1</td><td>12</td><td>false</td></tr><tr><td>2</td><td>25</td><td>false</td></tr><tr><td>3</td><td>27</td><td>false</td></tr><tr><td>4</td><td>29</td><td>false</td></tr><tr><td>5</td><td>38</td><td>false</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>20</td><td>30</td><td>true</td></tr><tr><td>21</td><td>34</td><td>true</td></tr><tr><td>22</td><td>36</td><td>true</td></tr><tr><td>23</td><td>40</td><td>true</td></tr><tr><td>24</td><td>48</td><td>true</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (24, 3)\n",
       "┌─────┬──────┬────────────┐\n",
       "│ IID ┆ time ┆ metastasis │\n",
       "│ --- ┆ ---  ┆ ---        │\n",
       "│ i64 ┆ i64  ┆ bool       │\n",
       "╞═════╪══════╪════════════╡\n",
       "│ 1   ┆ 12   ┆ false      │\n",
       "│ 2   ┆ 25   ┆ false      │\n",
       "│ 3   ┆ 27   ┆ false      │\n",
       "│ 4   ┆ 29   ┆ false      │\n",
       "│ 5   ┆ 38   ┆ false      │\n",
       "│ …   ┆ …    ┆ …          │\n",
       "│ 20  ┆ 30   ┆ true       │\n",
       "│ 21  ┆ 34   ┆ true       │\n",
       "│ 22  ┆ 36   ┆ true       │\n",
       "│ 23  ┆ 40   ┆ true       │\n",
       "│ 24  ┆ 48   ┆ true       │\n",
       "└─────┴──────┴────────────┘"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import polars as pl\n",
    "\n",
    "df = pl.read_csv(\"B_10_4-data.csv\", schema_overrides={\"metastasis\": pl.Boolean})\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "影响变量为二项分类变量，结果变量为多项有序分类变量，两总体相互独立，因而采用两独立样本的秩和检验。这里采用 Wilcoxon rank sum test.\n",
    "\n",
    "### 假设\n",
    "\n",
    "$ H_0 $: 有、无淋巴细胞转移患者的生存时间的总体分布相同。  \n",
    "$ H_1 $: 有、无淋巴细胞转移患者的生存时间的总体分布不同。\n",
    "\n",
    "### 假设检验"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Statistic: -2.1664981619457437\n",
      "P-value: 0.030273140162198255\n",
      "Significance: True\n"
     ]
    }
   ],
   "source": [
    "from scipy.stats import ranksums\n",
    "\n",
    "res = ranksums(\n",
    "    df.filter(pl.col(\"metastasis\").eq(True)).select(\"time\"),\n",
    "    df.filter(pl.col(\"metastasis\").eq(False)).select(\"time\"),\n",
    ")\n",
    "\n",
    "print(f\"\"\"Statistic: {float(res.statistic.sum())}\n",
    "P-value: {float(res.pvalue.sum())}\n",
    "Significance: {res.pvalue.sum() < 0.05}\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "P < 0.05, 在 $ \\alpha = 0.05 $ 的检验水准下，拒绝 $ H_0 $，接受 $ H_1 $，可以认为有、无淋巴细胞转移患者的生存时间的总体分布不同。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
